Stochastic Bandits with Pathwise Constraints
Not registered
Abstract
We consider the problem of stochastic bandits, with the goal of maximizing a reward while satisfying pathwise constraints. The motivation for this problem comes from cognitive radio networks, in which agents need to choose between different transmission profiles to maximize throughput under certain operational constraints such as limited average power. Stochastic bandits serve as a natural model for an unknown, stationary environment. We propose an algorithm, based on a steering approach, and analyze its regret with respect to the optimal stationary policy that knows the statistics of the different arms.
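The benchmark mentioned in the abstract, the optimal stationary policy that knows the arm statistics, can be illustrated with a small sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: a single average-cost constraint (for example, average power), known mean rewards and mean costs per arm, and a stationary policy represented as a fixed probability distribution over arms. The function name and parameters are hypothetical. Under these assumptions the benchmark is a linear program whose optimum either plays one feasible arm or mixes two arms so that the mean cost exactly meets the budget.

```python
from itertools import combinations


def best_stationary_policy(rewards, costs, budget):
    """Illustrative benchmark (assumed formulation, not taken from the paper).

    Maximizes sum(p[i] * rewards[i]) subject to sum(p[i] * costs[i]) <= budget,
    where p is a probability distribution over arms with known means.
    Returns (value, probabilities), or None if no arm fits the budget.
    """
    k = len(rewards)
    best = None

    # Candidate 1: always play a single arm whose mean cost fits the budget.
    for i in range(k):
        if costs[i] <= budget:
            probs = [0.0] * k
            probs[i] = 1.0
            if best is None or rewards[i] > best[0]:
                best = (rewards[i], probs)

    # Candidate 2: mix a cheaper arm with a more expensive one so that the
    # mean cost of the mixture equals the budget exactly.
    for i, j in combinations(range(k), 2):
        lo, hi = (i, j) if costs[i] < costs[j] else (j, i)
        if not (costs[lo] < budget < costs[hi]):
            continue
        w = (costs[hi] - budget) / (costs[hi] - costs[lo])  # weight on the cheaper arm
        value = w * rewards[lo] + (1.0 - w) * rewards[hi]
        if best is None or value > best[0]:
            probs = [0.0] * k
            probs[lo] = w
            probs[hi] = 1.0 - w
            best = (value, probs)

    return best


# Hypothetical numbers: three transmission profiles with mean throughput
# (reward) and mean power (cost), and an average power budget of 3.0.
value, probs = best_stationary_policy([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], 3.0)
print(value, probs)
```

In this toy instance no single arm is optimal: the best policy splits its time between the second and third arms. A learning algorithm that does not know the means would be compared against a value of this kind when its regret is measured.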
Publication date: 2011